Incremental Clustering: The Case for Extra Clusters
The explosion in the amount of data available for analysis often necessitates
a transition from batch to incremental clustering methods, which process one
element at a time and typically store only a small subset of the data. In this
paper, we initiate the formal analysis of incremental clustering methods
focusing on the types of cluster structure that they are able to detect. We
find that the incremental setting is strictly weaker than the batch model,
proving that a fundamental class of cluster structures that can readily be
detected in the batch setting is impossible to identify using any incremental
method. Furthermore, we show how the limitations of incremental clustering can
be overcome by allowing additional clusters.
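The incremental model described above (process one element at a time, keep only a small summary) can be made concrete with sequential k-means, a classic incremental method that uses MacQueen-style running-mean updates. The following is a minimal sketch of that generic model under our own assumptions (Euclidean points, a fixed k, the first k arrivals used as seeds). It is not the paper's construction, and it does not implement the extra-clusters remedy. The class and method names are hypothetical.

```python
from typing import List, Sequence


class SequentialKMeans:
    """Incremental k-means: sees one point at a time and stores only
    k centers and k counts, never the points themselves."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.centers: List[List[float]] = []
        self.counts: List[int] = []

    def update(self, x: Sequence[float]) -> int:
        """Process one point and return the index of the cluster it joins."""
        # Assumption: the first k arrivals seed the k centers.
        if len(self.centers) < self.k:
            self.centers.append([float(v) for v in x])
            self.counts.append(1)
            return len(self.centers) - 1
        # Assign x to the nearest center under squared Euclidean distance.
        j = min(range(self.k),
                key=lambda i: sum((c - v) ** 2 for c, v in zip(self.centers[i], x)))
        # Step size 1/n keeps each center equal to the mean of its points.
        self.counts[j] += 1
        eta = 1.0 / self.counts[j]
        self.centers[j] = [c + eta * (v - c) for c, v in zip(self.centers[j], x)]
        return j


model = SequentialKMeans(k=2)
stream = [(0.0, 0.0), (10.0, 10.0), (0.2, 0.1), (9.8, 10.1), (0.1, 0.3)]
labels = [model.update(p) for p in stream]
print(labels)  # [0, 1, 0, 1, 0]
```

The 1/n step makes each center the exact running mean of the points assigned to it so far. Assignments that have already been made are never revisited, which is what separates this from batch Lloyd iterations over the full data set.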
Uncovering Group Level Insights with Accordant Clustering
Clustering is a widely used data mining tool that aims to discover
partitions of similar items in data. We introduce a new clustering paradigm,
accordant clustering, which enables the discovery of (predefined) group-level
insights. Unlike previous clustering paradigms that aim to understand
relationships amongst the individual members, the goal of accordant clustering
is to uncover insights at the group level through the analysis of each group's
members. Group-level insights can often support a call to action that cannot be
informed through previous clustering techniques. We propose the first accordant
clustering algorithm and prove that it finds near-optimal solutions when data
possesses inherent cluster structure. The insights revealed by accordant
clusterings enabled experts in the field of medicine to isolate successful
treatments for a neurodegenerative disease, and those in finance to discover
patterns of unnecessary spending.
Comment: accepted to SDM 2017 (oral)
A Theoretical Study of Clusterability and Clustering Quality
Clustering is a widely used technique, with applications ranging
from data mining, bioinformatics and image analysis to marketing,
psychology, and city planning. Despite the practical importance of
clustering, there is very limited theoretical analysis of the topic.
We take a step towards building theoretical foundations for
clustering by carrying out an abstract analysis of two central
concepts in clustering: clusterability and clustering quality.
We compare a number of notions of clusterability found in the
literature. While all these notions attempt to measure the same
property, and all appear to be reasonable, we show that they are
pairwise inconsistent. In addition, we give the first computational
complexity analysis of a few notions of clusterability.
In the second part of the thesis, we discuss how the quality of a
given clustering can be defined (and measured). Users often need to
compare the quality of clusterings obtained by different methods.
Perhaps more importantly, users need to determine whether a given
clustering is good enough to be used in further data mining
analysis. We analyze what a measure of clustering quality should
look like by introducing a set of requirements ("axioms") for
clustering quality measures. We propose a number of clustering
quality measures that satisfy these requirements.
Towards Theoretical Foundations of Clustering
Clustering is a central unsupervised learning task with a wide variety of applications. Unlike in supervised learning, different clustering algorithms may yield dramatically different outputs for the same input. As such, the choice of algorithm is crucial. When selecting a clustering algorithm, users tend to focus on cost-related considerations, such as running times and software purchasing costs. Yet differences in the output of the algorithms are a more fundamental consideration. We propose an approach for selecting clustering algorithms based on differences in their input-output behaviour. This approach relies on identifying significant properties of clustering algorithms and classifying algorithms based on the properties that they satisfy.
We begin with Kleinberg's impossibility result, which relies on concise abstract properties that are well-suited for our approach. Kleinberg showed that three specific properties cannot be satisfied by the same algorithm. We illustrate that the impossibility result is a consequence of the formalism used, proving that these properties can be formulated without leading to inconsistency in the context of clustering quality measures or algorithms whose input requires the number of clusters.
Combining Kleinberg's properties with newly proposed ones, we provide an extensive property-based classification of common clustering paradigms. We use some of these properties to provide a novel characterization of the class of linkage-based algorithms. That is, we distil a small set of properties that uniquely identify this family of algorithms.
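To make one of Kleinberg's properties concrete: scale invariance requires that multiplying every pairwise distance by the same constant alpha > 0 leaves the output partition unchanged. Single linkage stopped once a fixed number k of clusters remains is a standard linkage-based example that satisfies this property. The sketch below checks it on a toy data set of our own choosing. It implements plain single linkage via Kruskal-style union-find and is not the thesis's characterization of the linkage family. The `scale` parameter is a hypothetical device for simulating rescaled distances.

```python
import itertools
import math
from typing import FrozenSet, Sequence


def single_linkage(points: Sequence[Sequence[float]], k: int,
                   scale: float = 1.0) -> FrozenSet[FrozenSet[int]]:
    """Merge the closest pair of clusters until k remain; return the
    partition as a set of index sets. `scale` multiplies every distance."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Visiting pairs in increasing distance order and joining distinct
    # components is equivalent to single linkage (Kruskal's MST algorithm).
    edges = sorted((scale * math.dist(points[i], points[j]), i, j)
                   for i, j in itertools.combinations(range(len(points)), 2))
    components = len(points)
    for _, i, j in edges:
        if components == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), set()).add(i)
    return frozenset(frozenset(g) for g in groups.values())


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
base = single_linkage(points, k=3)
# Scale invariance: rescaling all distances by alpha > 0 keeps the partition.
for alpha in (0.01, 3.0, 1000.0):
    assert single_linkage(points, k=3, scale=alpha) == base
print(sorted(sorted(c) for c in base))  # [[0, 1], [2, 3], [4]]
```

Because k is fixed in advance, this algorithm cannot produce every possible partition of the input, so it fails Kleinberg's richness property. That failure is exactly why the choice of formalism for algorithms that take the number of clusters as input matters.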
Lastly, we investigate how the output of algorithms is affected by the addition of small, potentially adversarial, sets of points. We prove that given clusterable input, the output of k-means is robust to the addition of a small number of data points. On the other hand, clusterings produced by many well-known methods, including linkage-based techniques, can be changed radically by adding a small number of elements.
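The sensitivity of linkage-based methods can be seen in the well-known chaining effect of single linkage. The toy example below uses 1-D data of our own choosing. There are three tight groups A, B and C, and the gap between A and B (10) is smaller than the gap between B and C (12). With k = 2, single linkage first splits off C. Adding just two points between B and C creates a chain of short gaps, which reverses the grouping. This is an illustration only. It does not reproduce the thesis's proofs, and it does not model the k-means robustness result, which depends on a clusterability assumption.

```python
import itertools
import math


def single_linkage(points, k):
    """Single linkage stopped at k clusters, via Kruskal-style union-find.
    Returns the clusters as a sorted list of sorted point lists."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in itertools.combinations(range(len(points)), 2))
    components = len(points)
    for _, i, j in edges:
        if components == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return sorted(sorted(g) for g in groups.values())


# Three tight 1-D groups; the A-B gap (10) is smaller than the B-C gap (12).
A = [(0,), (1,), (2,)]
B = [(12,), (13,), (14,)]
C = [(26,), (27,), (28,)]
before = single_linkage(A + B + C, k=2)
# Two added points chain B to C with gaps of 4, so the widest gap is now A-B.
after = single_linkage(A + B + C + [(18,), (22,)], k=2)
print(before)  # [[(0,), (1,), (2,), (12,), (13,), (14,)], [(26,), (27,), (28,)]]
print(after)   # [[(0,), (1,), (2,)], [(12,), (13,), (14,), (18,), (22,), (26,), (27,), (28,)]]
```

Single linkage only ever looks at the shortest link between groups. A sparse bridge of a few points is therefore enough to fuse B with C, which in turn forces A to separate from B.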
Co-Creative Songwriting for Bereavement Support
Self-expression is essential to processing our thoughts and feelings and is central to successful mental health therapy. Art therapy provides a wider range of expressive mechanisms than offered through traditional approaches, allowing individuals to process their emotions when traditional therapies prove unsuccessful. Yet, effective expression through art therapy may call on a level of artistic experience that is not available to all. As such, a lack of expertise or comfort with artistic expression may hinder one’s ability to receive needed mental health support. Creative machines can offer novel therapeutic approaches by offloading the need for creative expertise and opening up creative self-expression to those who lack the corresponding experience. In this paper, we focus on bereavement, and explore a co-creative songwriting system, ALYSIA, as a new form of therapy for those who have recently suffered the loss of a loved one. We evaluate the utility of this creative system in aiding bereaved individuals through several case studies. In addition, we discuss the utility of co-creative systems in therapeutic contexts, with potential application to a broad range of therapies.